BY XIAOHUI ZHANG,
RENO FIEDLER, AND
MICHAEL POPOVICH



A Biointelligence System 
for Identifying Potential 
Disease Outbreaks 

Monitoring Over-the-counter Pharmaceutical Sales
Data as an Indicator of Changes in Public Health Status 



It can reasonably be expected that consumer spending patterns, such as purchases of over-the-counter pharmaceuticals, hold a strong relationship to the public health status, possibly indicating status changes earlier than traditional public health information systems. In the light of events over the last two years, such as the anthrax scare, the SARS outbreak, and other public health events, over-the-counter (OTC) data collection has received significant attention. Most OTC projects focus on data collection and basic statistical analysis. However, many questions have yet to be answered. For example: How does the OTC sales data link to the public health status and how can this be described in a rigorous framework? How can potential disease outbreaks be detected within the noise of real-world OTC data? Syndromic surveillance has come to the attention of public health only recently, so how can its results be integrated and validated within the existing body of public health knowledge? Overall, there exists no systematic framework for the potentially automated collection and analysis of OTC data for public health management consumption. This area remains largely unexplored. Small sampling sizes, significant fluctuations within OTC sales data, and the lack of evidentiary information confirming it make OTC sales surveillance systems challenging and difficult. These difficulties are magnified in that a biointelligence system (BIS) must become a component of the established public health information system infrastructure, while requiring an end-to-end implementation of advanced information technologies and a near real-time execution.

The development of public health surveillance systems requires multidisciplinary knowledge and advanced technologies. Halperin and Baker [1] provided an excellent summary on public health surveillance systems, and the Centers for Disease Control and Prevention [2], [3] developed new guidelines and recommendations for the evaluation of public health surveillance systems.

"The lessons learnt from the events following September 11, 2001, and the subsequent Anthrax attacks, have proven that new and innovative technologies resources are absolutely necessary to ensure the nation is fully prepared" [4]. To that end, Scientific Technologies Corp. has developed the NH Pharmaceutical Sales Surveillance (NHPSS) for the New Hampshire Department of Health and Human Services (NH DHHS). The NHPSS was developed as a distributed information system. In addition to the database server, enterprise application servers, and the Web-browser-based user interface, its architecture features knowledge-base technology, a new dynamic system model with rule systems, and automated data analysis in supporting public health surveillance. Internet mapping was also embedded in the system to provide for spatial analysis. Since its pilot application, started in December of 2002 in the Bureau of Communicable Disease Control and Surveillance, NH DHHS, the NHPSS has assisted NH DHHS in successful detections of gastrointestinal and respiratory events.

This article first introduces the methodology and the technology in the NHPSS development. Next, the system functionalities are introduced. Then, a new dynamic system model with a rule system for public health surveillance is described in detail. Finally, its application and preliminary results are summarized.

Methodologies in Development of NHPSS

A Multidisciplinary Development

The multidisciplinary development of NHPSS draws from five system domains, going beyond real-time data compilation:

1) information technology (connectivity);

2) data models (adequate spatio-temporal dimensions);

3) knowledge-base (domain knowledge and data derived knowledge);

4) analytical methods (dynamic model and algorithms in automated processing);

5) enterprise applications.

Each of these domains plays an important role, and a systematic integration has been achieved in NHPSS. This article focuses on the data model, knowledge-base technology, and analytical methods developed in the NHPSS.

Data Processing and Information Organization

Massive OTC daily sales data are collected at the pharmacy stores. Those sales can be categorized into medications that treat respiratory-related syndromes, gastrointestinal-related syndromes, allergy syndromes, etc., according to their active ingredients. The categorized sales data at a store reflect the public health status in this category around that geographical area. The sales amount is furthermore impacted by the population in that area and the convenience to access the service (both hospital service and pharmacy service). After the purchase, a customer can take the medicine for several days. These factors and the spatial and temporal variations in the OTC sales have been quantitatively reflected in the NHPSS data model and analytical process. To identify a potential outbreak, it is necessary to establish a set of reference lines for OTC sales. To that end, a measurement scheme has to be defined first. In the NHPSS, the geographical units are defined as the store service area, zip code area, city, and statewide area. In each geographical unit, the underlying population data (possibly with the age groups) can be abstracted from census data. A store's service area is derived from the driving distance between the store and its potential customers' homes. Time units used are daily, weekly, and monthly. Data processing and knowledge acquisition will be discussed in the following section.

From Raw Data to Spatial Data Warehouse

After replication of the daily OTC sales data, the raw data are automatically processed along spatial and temporal dimensions. In conjunction with a rule base, basic statistical methods have been applied here to derive the reference lines. The developed system requires a minimum of one month of historical data, while it is recommended that more than one year of data is available to improve the confidence set. Let x be the sales amount of a categorized medicine for a time unit (e.g., daily). This amount will be compared to reference lines (e.g., monthly) derived from historical sales records of the prior n years, the specified category syndrome, and the geographical unit. The reference lines include a base line representing the regular daily amount and upper reference lines incorporating the confidence levels. Since the reference lines are computed periodically at each geographical level, the seasonal variations are maintained and the spatial characteristics are captured.
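The derivation of reference lines can be sketched as follows. This is a minimal illustration under stated assumptions, not the NHPSS implementation: it assumes the base line is the monthly mean of daily sales and that each upper reference line sits a whole number of sample standard deviations above it. The function name `reference_lines` and the `n_sigma` parameter are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev


def reference_lines(history, n_sigma=(1, 2, 3)):
    """Return {month: (baseline, [upper lines])} from (date, units_sold) pairs.

    Assumption: the base line is the monthly mean of daily sales and each
    upper reference line is baseline + n * sample standard deviation.
    """
    by_month = defaultdict(list)
    for day, units in history:
        by_month[day.month].append(units)

    lines = {}
    for month, values in by_month.items():
        baseline = mean(values)
        # stdev needs at least two points; use 0 spread for a single day.
        spread = stdev(values) if len(values) > 1 else 0.0
        lines[month] = (baseline, [baseline + n * spread for n in n_sigma])
    return lines


# Seven illustrative December days for one category in one geographical unit.
history = [(date(2002, 12, d), units)
           for d, units in zip(range(1, 8), [40, 42, 38, 41, 39, 43, 37])]
baseline, uppers = reference_lines(history)[12]
```

Grouping by calendar month keeps the seasonal variation in the lines; running the same function once per geographical unit would keep the spatial characteristics separate.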

The rule system, integrated with the data warehouse approach, adapts the automated data processing and handles the exceptions. Two possible special cases have been considered: a) an epidemic outbreak was recorded in the history of this place, and b) there may be less than one year of historical data. The data warehouse organizes the seasonally varying reference lines at each geographical level.
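One way to picture the two special cases is a filter applied to the history before reference lines are computed. The sketch below is an assumption about how such exceptions might be handled, not the documented NHPSS rule: it drops days that fall inside a recorded outbreak and enforces the one-month minimum stated earlier. The name `usable_history` and the data shapes are hypothetical.

```python
from datetime import date


def usable_history(history, outbreak_periods, min_days=30):
    """Drop days inside recorded outbreaks, then check the one-month minimum.

    history: list of (date, units_sold) pairs.
    outbreak_periods: list of (start_date, end_date) pairs, inclusive.
    Both shapes are assumptions made for this sketch.
    """
    def in_outbreak(day):
        return any(start <= day <= end for start, end in outbreak_periods)

    kept = [(day, units) for day, units in history if not in_outbreak(day)]
    if len(kept) < min_days:
        raise ValueError(f"need at least {min_days} clean days, got {len(kept)}")
    return kept


# Forty illustrative days starting 1 November 2002, with a five-day outbreak.
first = date(2002, 11, 1)
history = [(date.fromordinal(first.toordinal() + i), 40 + i % 3)
           for i in range(40)]
outbreaks = [(date(2002, 11, 10), date(2002, 11, 14))]
clean = usable_history(history, outbreaks)
```

Excluding outbreak days keeps a past epidemic from inflating the base line; raising an error on a short history makes the missing-data case explicit rather than silent.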

It is worth noticing that a GIS tool was also integrated with the developed spatial data warehouse. The GIS derives and organizes the spatial background information, which includes the population with age groups. It also performs the spatial analysis.

First, a measurement scheme was defined to quantitatively and qualitatively evaluate the deviation of the incoming daily data from the reference line (to identify the possible abnormality) at each place. Next, a set of algorithms for a structural component analysis has been developed. The incoming OTC data are transformed into the structural components. A mapping of the structural components into the dynamic system model describes the change of public health status and identifies the possible unusual events.

A Dynamic Model for the Public Health Status

The dynamic process model in the state-space form was systematically formalized by Kalman, Falb, and Arbib (1968) [5]. Rosenbrock (1970) [6] expanded multivariate systems in a state-space form. Since then, mathematical system theory, modern control engineering, and computer technology have enabled extensive successes in several industries. However, nonlinear systems in a state-space form are much less well understood. In public health, Castillo-Chavez et al. [7] state that "the basic epidemiological equations are sufficiently nonlinear." The dynamic system model in state space as a means to describe this complexity has not been mentioned, as almost nothing is known about them in the public health context. This contrasts strongly with the commonly deployed intensive statistical approaches and stochastic epidemic modeling. A knowledge base with rule system can significantly improve decision-making support, because a large class of nonlinear functions can be described there, and a priori knowledge can be formulated from domain experts or derived from the data. Our effort has been to develop an integrated system that would embrace dynamic system theory, statistical methods, and a rule system with knowledge-base techniques as a unified tool. It is oriented toward problem solving in syndromic surveillance, but it is a biointelligence framework with many possible generalizations and implementations. The developed system can be implemented in a relatively short time, as demonstrated in the case of NHPSS.

Figure 1 shows the defined states and state transition diagram. The dynamic change of the public health status is modeled here in a new state-space form. This state-space form differs from the conventional state-space approaches in that here the state transition, input mapping, and output mapping are governed by the rule system, while the conventional state-space form uses crisp algebra or linear algebra in most cases. With a state-space notation, at a specified place, the categorized public health status is explicitly modeled by a set of state variables, which vary over time. Defined by this model, in a specified place, at a specific time, a categorized health status is one of the following: healthy status ($S_h$), critical status ($S_c$), starting-unusual status ($S_s$), upward-trend-unusual status ($S_u$), peak-unusual status ($S_p$), downward-trend-unusual status ($S_d$), and ending-unusual status ($S_e$). The state transitions over time reflect the dynamic change of the public health status.

Fig. 1. The states and state transition diagram of the public health status.

JANUARY/FEBRUARY 2004

The state space $S$ is defined with its state variables $\{S_h, S_c, S_s, S_u, S_p, S_d, S_e\}$. A validated state transition from state $S_i(k)$ to state $S_j(k+1)$ is determined by the rule system, which operates in relational algebra on its supporting set $X_i(k)$, the inputs evaluated at state $S_i(k)$:

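$S_j(k+1) = R_{ij} \otimes X_i(k)$    (1)

where

(2) [illegible in the source]

(3) [illegible in the source; only the fragment $\otimes\, G_i(k)$ survives]

(4) [illegible in the source]

$\mathrm{supp}(X_i) = \mathrm{supp}(X_{i,1}) \times \mathrm{supp}(X_{i,2}) \times \mathrm{supp}(X_{i,3})$    (5)

(6) [illegible in the source]

[The coefficient $c_j$ also appears in (1); its position and the subscripts in (5) are only partly legible.]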



In the defined system equations, there are three transformed components, which are derived from the incoming raw data and then mapped into the supporting set. At time $k$ (for example, a time unit can be daily), the state transition from state $S_i(k)$ to state $S_j(k+1)$ is determined by the rule system $R_{ij}$, which evaluates the supporting set $X_i(k)$, as shown in (1), where $\otimes$ stands for the inference operation, or a rule system operation, which can be logical operations or algebra operations or a hybrid. Equation (1) also defines the quantitative measurement for state $S_j(k)$ in that category at the specified place. The coefficient $c_j$ can be defined by the user or it can be related to a threshold value obtained from the historical data set. The latter approach was taken in the NHPSS implementation to enable automated application of very granular rules without subjective input by experts.
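A rule-governed state transition of this kind can be pictured as a small state machine. The sketch below is a minimal illustration, not the NHPSS rule base: the seven states come from the text, but the transition rules, the use of a single daily sales value as input, and the names `next_status` and `trace` are all assumptions, since the transitions of Fig. 1 are not reproduced in the text. The thresholds `baseline` and `upper` stand in for values derived from historical data.

```python
from enum import Enum


class Status(Enum):
    HEALTHY = "healthy"
    CRITICAL = "critical"
    STARTING = "starting-unusual"
    UPWARD = "upward-trend-unusual"
    PEAK = "peak-unusual"
    DOWNWARD = "downward-trend-unusual"
    ENDING = "ending-unusual"


UNUSUAL = {Status.STARTING, Status.UPWARD, Status.PEAK, Status.DOWNWARD}


def next_status(current, sales, previous_sales, baseline, upper):
    """Hypothetical rule base: choose S_j(k+1) from S_i(k) and the inputs."""
    if sales > upper:
        if current not in UNUSUAL:
            return Status.STARTING      # first day above the upper line
        if sales > previous_sales:
            return Status.UPWARD        # still climbing
        if current in (Status.STARTING, Status.UPWARD):
            return Status.PEAK          # sales stopped rising
        return Status.DOWNWARD
    if sales > baseline:
        # Between the lines: an open event is declining, otherwise watch.
        return Status.DOWNWARD if current in UNUSUAL else Status.CRITICAL
    # At or below the base line: close an open event, otherwise healthy.
    return Status.ENDING if current in UNUSUAL else Status.HEALTHY


def trace(series, baseline, upper):
    """Apply the rule day by day, starting from the healthy status."""
    status, previous, path = Status.HEALTHY, series[0], []
    for sales in series:
        status = next_status(status, sales, previous, baseline, upper)
        path.append(status)
        previous = sales
    return path


path = trace([39, 44, 50, 58, 57, 48, 43, 38], baseline=40, upper=46)
```

Because the next status depends on the current one as well as on the data, the same sales value can map to different statuses, which is the property that separates this form from a plain threshold test.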

Equation (2) describes that, at a state $S_i(k)$, there is the supporting set $X_i(k)$ with three structural components whose respective thresholds can be incorporated. The rule system maps the components into the supporting set $X_i(k)$.

Equation (3) describes the output mapping, which interprets the outputs from a set of states or a state history with the specified weight for the states. In addition, the rule system combines the background information $G_i$, such as the environmental factors with the population demographics in the study area.

Equations (4) and (5) define the supporting system $X$ as an additive combination of supporting sets, allowing inputs to be comprised of multiple data sources. In the NHPSS evaluation period, data sources included the OTC sales data, emergency department encounters, school closure events, and case count data.

Equation (6) defines that the value of an output is a combination of the likelihood index of abnormality ($L_{i,k}$), the trend indicator ($T_{i,k}$), and the potential impact index ($I_{i,k}$). An exemplary set has been defined here as:

{$L_{i,k}$ : (low, medium, high)},

{$T_{i,k}$ : (stable, upward, downward)},

{$I_{i,k}$ : (minor, moderate, significant)}.

Consider the sample case where $Y = \{L_2, T_2, I_3\}$. This case represents a medium likelihood of abnormality, with an upward trend and a possibly significant potential impact. In reality, this situation might require the specified extensive management.
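The output triple can be encoded directly as three ordered labels. The sketch below is one possible encoding, assuming that each index takes the values 1 to 3 in the order listed and that "low" is the first likelihood level (that label is not legible in the source). The class names and the `describe` method are hypothetical.

```python
from enum import IntEnum
from typing import NamedTuple


class Likelihood(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Trend(IntEnum):
    STABLE = 1
    UPWARD = 2
    DOWNWARD = 3


class Impact(IntEnum):
    MINOR = 1
    MODERATE = 2
    SIGNIFICANT = 3


class Output(NamedTuple):
    likelihood: Likelihood
    trend: Trend
    impact: Impact

    def describe(self):
        """Spell out the triple in words for a report or alert."""
        return (f"{self.likelihood.name.lower()} likelihood of abnormality, "
                f"{self.trend.name.lower()} trend, "
                f"{self.impact.name.lower()} potential impact")


# The sample case from the text: second likelihood level, second trend
# level, third impact level.
sample = Output(Likelihood(2), Trend(2), Impact(3))
summary = sample.describe()
```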

Knowledge-Base Technique and Rule Systems

A knowledge base could derive its information from data sets. Fensel and Studer [8] provide a comprehensive description of the application of knowledge acquisition and management. In NHPSS, a knowledge base compiles the incoming raw data into the designed forms. Next, it derives the relational facts, temporal characteristics, and regional patterns. Data processing methods include statistical analysis over space and time. The knowledge base organizes the information such that queries or evaluations posed to the knowledge base can be answered by means of an inference-based question-answering operation, or alternatively, an automated operation on evaluation and responses. During the development of NHPSS, GIS was integrated with the knowledge base. With knowledge-base technology, large data sets are processed along temporal and spatial dimensions. Information is derived to characterize the spatial distribution over time. Then the spatial data warehouse organizes the knowledge and its derivative information hierarchically by spatial areas. Figure 2 illustrates a spatiotemporal knowledge acquisition by deriving the seasonally varying reference lines, the trends, the extreme values, and the clusters for the OTC sales in geographical dimensions for the categorized disease syndromes in a local area incorporating regional and statewide information.

A rule system is a set of rules, arguments, constraints, relations, and responses. A rule can be numerical, lexical, or both. A hybrid rule system consists of both explicit functions and logical rules. Bardossy and Duckstein (1995) [9] have an excellent introduction to rule systems. The rule system in NHPSS is a hybrid rule system that was developed for automated operations of the BIS. The rule system is implemented with a set of decision matrices. The developed rule system consists of sets of logical rules. Combining statistical analysis and epidemiology knowledge, the rule system evaluates the decision matrices and produces the responses. An example is the comparison of the incoming data with the set of references. Differences are then quantitatively and qualitatively evaluated with respect to the space-time dimensions. The abnormalities of the OTC medicine sales are identified and assessed using relational algebra, relational calculus, and classical calculus. Figure 3 illustrates the rule system approach. Figure 4 shows the integration of the spatial data warehouse and knowledge-base techniques with the rule system to support the automated analysis and reporting.
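A hybrid rule system built on a decision matrix can be sketched as two numerical rules feeding one lookup. The matrix contents, the labels, and the function names below are hypothetical and chosen only to show the mechanism; the actual NHPSS decision matrices are not given in the text.

```python
# Hypothetical decision matrix: one response per (deviation, trend) cell.
DECISION_MATRIX = {
    "normal":   {"stable": "none",   "upward": "none",   "downward": "none"},
    "elevated": {"stable": "watch",  "upward": "review", "downward": "watch"},
    "abnormal": {"stable": "review", "upward": "alert",  "downward": "review"},
}


def deviation(sales, baseline, upper):
    """Numerical rule: grade the latest sales against the reference lines."""
    if sales > upper:
        return "abnormal"
    return "elevated" if sales > baseline else "normal"


def trend(window, tolerance=1.0):
    """Numerical rule: compare the last and first values of a short window."""
    change = window[-1] - window[0]
    if abs(change) <= tolerance:
        return "stable"
    return "upward" if change > 0 else "downward"


def respond(window, baseline, upper):
    """Logical rule: look up the response for the graded inputs."""
    row = DECISION_MATRIX[deviation(window[-1], baseline, upper)]
    return row[trend(window)]


response = respond([41, 44, 49], baseline=40, upper=46)
```

Keeping the responses in a table rather than in nested conditionals means an epidemiologist could review or change a cell without touching the grading functions.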



Implementation of NHPSS

NHPSS was implemented as a distributed information system. Figure 5 shows the NHPSS system structure. Daily OTC pharmaceutical sales data are collected at each store, recorded at the pharmacy chain headquarters, and transmitted to NH DHHS. These data are replicated in data servers at the state public health department. The developed BIS data warehouse organizes data along logical dimensions. Next, automated data processing occurs in the application servers. Finally, analyses, reports, and alerts (if necessary) are generated to assist the decision-making process of public health management. The user interface is Internet browser-based. With secured access, users can browse the data, search the reports and maps, and review the results of the trend analyses and unusual event detection methods as created by the built-in rule base.

GIS plays a key role in the BIS spatial data warehouse with knowledge acquisition, as was introduced above. It also performs the spatial analysis, such as abnormality analyses with scanning and ranking. Furthermore, GIS supports the output mapping for risk assessment and provides comprehensive reports with possible alerts. Figure 6 depicts the integration of GIS for knowledge acquisition and OTC analysis in NHPSS.

Fig. 2. Spatiotemporal knowledge acquisition scheme in NHPSS.

Fig. 3. Illustration of the rule system in NHPSS.

The main syndromic surveillance functions of the NHPSS BIS implementation can be summarized as:

(1) Automated data capture of OTC pharmaceutical sales data:

> Approximately 300 different pharmaceutical items are currently categorized into Gastrointestinal Diseases and Respiratory Illnesses.

> The measurement unit of disease indicators can be the daily number of sold packages (NHPSS), or the amount of sold active ingredient.

(2) Automated processing for data-derived reference lines (from historical data):

> Central lines: monthly-averaged (or weekly-averaged) daily sales

> Control lines: Min., Max., N-sigma lines, and Confidence Interval Upper Limits.

(3) Analysis and Reporting in Time Dimensions:

> Detailed or aggregated reporting in daily/weekly/monthly for the selected place, with capability of comparison to the historical data.

(4) Analysis and Reporting in Geographical Areas:

> Map display with alerting capability for the specified time and disease indicators.

> Pinpoint the unusual areas.

(5) Rule-based Trend Analysis and Event Detections:

> detection of an unusual single-point-value event by comparison to the control lines

> detection of clusters, and early warning of cluster-drifting

> detection and early warning of weekly average shifting

> detection and early warning of potential trend shifting

> detection of starting date, peak, and ending date of an event.
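Two of the detections listed under (5) can be sketched with simple comparisons against the reference lines. The definitions used here are assumptions made for illustration: an event is taken to be a run of consecutive days above the upper control line, and a weekly shift is a full seven-day block whose mean exceeds the central line. The function names are hypothetical.

```python
def detect_event(daily_sales, upper):
    """Return (start, peak, end) day indexes of the first run above `upper`.

    Assumption: an event is a maximal run of consecutive days whose sales
    exceed the upper control line. Returns None if no day exceeds it.
    """
    start = None
    for day, sales in enumerate(daily_sales):
        if sales > upper and start is None:
            start = day
        elif sales <= upper and start is not None:
            end = day - 1
            break
    else:
        if start is None:
            return None
        end = len(daily_sales) - 1  # the event is still open on the last day

    run = daily_sales[start:end + 1]
    peak = start + run.index(max(run))
    return start, peak, end


def weekly_shift(daily_sales, central):
    """Flag each full 7-day block whose mean exceeds the central line."""
    weeks = [daily_sales[i:i + 7] for i in range(0, len(daily_sales) - 6, 7)]
    return [sum(week) / 7 > central for week in weeks]


sales = [39, 41, 44, 50, 58, 57, 48, 43, 38, 40, 39, 41, 40, 38]
event = detect_event(sales, upper=46)
shifts = weekly_shift(sales, central=40)
```

With these illustrative numbers the event starts on day index 3, peaks on day index 4, and ends on day index 6, and only the first week is flagged as shifted.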

Figure 7 shows NHPSS hierarchical decision support, step by step, from a time series alert at the state level to pinpointing the unusual local areas and its detailed reports.

Fig. 4. Integration of spatial data warehouse, knowledge base, and rule system in NHPSS.



Pilot Application of NHPSS

The pilot application of the NHPSS BIS started in December 2002 at the Bureau of Communicable Disease Control and Surveillance (BCDCS), NH DHHS. Pharmacy stores in 23 cities reported daily OTC sales for a select set of pharmaceuticals to NH DHHS. The participating pharmacy stores represent approximately 10% of all stores in NH statewide and about 30% in the major cities. Since then, NHPSS has successfully supported BCDCS in detecting a large-scale gastrointestinal disease outbreak at the end of 2002 at both state and local levels, and a major influenza outbreak in February of 2003. The NHPSS output was compared to hospital ER counts, case counts, laboratory information, and some school closure events. All data sources covered the same time period in the same area. The hospital ER data covers about 60% of total ER visits in New Hampshire statewide. For the gastrointestinal disease outbreak, the NHPSS triggered an alert four days prior to the recognition of the outbreak at the state level. In local areas, such as communities, it has provided alerts up to ten days early. Furthermore, for some cities, NHPSS has provided alerts up to 12 days prior to the closure of schools, combating the spread of the influenza outbreak. Equally important is that the spatial and temporal characteristics of the outbreaks can be reported by the embedded Internet GIS application. Figure 8 displays the analysis results of NHPSS outputs, with the GIS tool scanning and ranking the abnormality in support of risk assessment. Several other less impactful localized and statewide events have been detected and described by the NHPSS BIS.

Fig. 5. Diagram of the NHPSS system structure. [Diagram labels: OTC store daily data; pharmacy chain data; automated OTC syndromic surveillance system with rule-based alerting; Internet mapping; state PHD, supported by STC.]

IEEE ENGINEERING IN MEDICINE AND BIOLOGY MAGAZINE

Conclusions

Developing syndromic surveillance systems to support early warning of both natural and possibly intentional public health events has received significant attention in the past two years. This article has introduced a biointelligence system that integrates advanced information technology with dynamic system theory to support the early detection of potential disease outbreaks. A new dynamic model was developed in the state-space form to describe the change of public health status incorporating multiple sources of syndromic surveillance data. A measurement scheme for the spatially and temporally varying time series data was developed. Spatial data warehouse and knowledge-base techniques are integrated with automated data processing and automated knowledge acquisition. A companion rule system governs the state transitions and supports the event detection. The implemented system, NHPSS, has Internet GIS to support spatial analysis and decision-making, as well as a commercial-off-the-shelf Web-based reporting tool. The pilot application has yielded promising results. Beyond the merits of the developed system for the OTC sales surveillance, the framework has been generalized for multiple data sources. This system is still in the early stage, but the preliminary results already start to challenge the traditional methods in their own area of excellence or where they cannot be applied.

NHPSS has demonstrated that public health management can greatly benefit from early warnings for disease outbreaks through the implementation of automated syndromic surveillance. The largest remaining challenge is the cooperation of multiple organizations and the private sectors in sharing the information for the common goal while preserving confidentiality of proprietary interests such as market penetration by individual pharmacy chains.